Abstract — In this paper we propose minimum mean squared error (MMSE) iterative successive parallel arbitrated decision feedback (DF) receivers for direct sequence code division multiple access (DS-CDMA) systems. We describe the MMSE design criterion for DF multiuser detectors along with successive, parallel and iterative interference cancellation structures. A novel efficient DF structure that employs successive cancellation with parallel arbitrated branches and a near-optimal low complexity user ordering algorithm are presented. The proposed DF receiver structure and the ordering algorithm are then combined with iterative cascaded DF stages for mitigating the deleterious effects of error propagation for convolutionally encoded systems with both Viterbi and turbo decoding as well as for uncoded schemes. We mathematically study the relations between the MMSE achieved by the analyzed DF structures, including the novel scheme, with imperfect and perfect feedback. Simulation results for an uplink scenario assess the new iterative DF detectors against linear receivers and evaluate the effects of error propagation of the new cancellation methods against existing ones.



I. Introduction 

Multiuser detection has been proposed as a means to suppress multi-access interference (MAI), increasing the capacity and the performance of CDMA systems [1]. The optimal multiuser detector of Verdu [2] suffers from exponential complexity and requires the knowledge of timing, amplitude and signature sequences. This fact has motivated the development of various sub-optimal strategies: the linear [3] and decision feedback (DF) [4] receivers, the successive interference canceller [5] and the multistage detector [6]. Recently, Verdu and Shamai [7] and Rapajic et al. [8] have investigated the information theoretic trade-off between the spectral and power efficiency of linear and non-linear multiuser detectors in synchronous AWGN channels. These works have shown that, given a sufficient signal to noise ratio and for high loads (the ratio of users to processing gain close to one), DF detection has a substantially higher spectral efficiency than linear detection. For uplink scenarios, DF structures, which are relatively simple and perform linear interference suppression followed by interference cancellation, provide substantial gains over linear detection.

Minimum mean squared error (MMSE) multiuser detectors usually show good performance and have simple adaptive implementations. In particular, when used with short or repeated spreading sequences, the MMSE design criterion leads to adaptive versions which only require a training sequence for estimating the receiver parameters. Previous work on DF detectors examined successive interference cancellation [9], [10], [11], parallel interference cancellation [13], [14], [15] and multistage or iterative DF detectors [14], [15]. The DF detector with successive interference cancellation (S-DF) is optimal, in the sense that it achieves the sum capacity of the synchronous AWGN channel [10]. The S-DF scheme is capable of alleviating the effects of error propagation, although it generally leads to non-uniform performance over the users. In particular, the user ordering plays an important role in the performance of S-DF detectors. Studies on decorrelator DF detectors with optimal user ordering have been reported in [11] for imperfect feedback and in [12] for perfect feedback. The problem with the optimal ordering algorithms in [11], [12] is that they represent a very high computational burden for practical receiver design. Conversely, the DF receiver with parallel interference cancellation (P-DF) [13], [14], [15] satisfies the uplink requirements, namely, cancellation of intra-cell interference and suppression of the remaining other-cell interference, and provides, in general, uniform performance over the user population, even though it is more sensitive to error propagation. The multistage or iterative DF schemes presented in [14], [15] combine S-DF and P-DF schemes in multiple stages in order to refine the symbol estimates. This results in improved performance over conventional S-DF and P-DF schemes and in the mitigation of error propagation.

In this work, we propose the design of MMSE DF detectors that employ a novel successive parallel arbitrated DF (SPA-DF) structure based on the generation of parallel arbitrated branches. The motivation for the novel DF structures is to mitigate the effects of error propagation often found in P-DF structures [13], [14], [15]. The basic idea is to improve the S-DF structure using different orders of cancellation and then select the most likely estimate. A near-optimal user ordering algorithm is described for the new SPA-DF detector structure and is compared to the optimal user ordering algorithm, which requires the evaluation of K! different cancellation orders. The results show that the SPA-DF structure with the suboptimal ordering algorithm can achieve a performance very close to that of the S-DF with optimal ordering. Furthermore, the new SPA-DF scheme is combined with iterative cascaded DF stages, where the subsequent stage uses S-DF, P-DF or the new SPA-DF system to refine the symbol estimates of the users and combat the effects of error propagation. The performance of the proposed SPA-DF scheme, the sub-optimal ordering algorithm and their combinations with other schemes in a multistage detection structure is investigated for both uncoded and convolutionally encoded systems with Viterbi and turbo decoding.

This paper is structured as follows. Section II briefly describes the DS-CDMA system model. The MMSE decision feedback receiver filters are described in Section III. Section IV is devoted to the novel SPA-DF scheme, the near-optimal user ordering algorithm and the combination of the SPA-DF detector with iterative cascaded DF stages. Section V details the proposed SPA-DF receiver for convolutionally coded systems with Viterbi and turbo decoding. Section VI presents and discusses the simulation results and Section VII draws the concluding remarks of this paper.

II. DS-CDMA SYSTEM MODEL 

Let us consider the uplink of a symbol synchronous binary phase-shift keying (BPSK) DS-CDMA system with $K$ users, $N$ chips per symbol and $L_p$ propagation paths. It should be remarked that a synchronous model is assumed for simplicity, although it captures most of the features of more realistic asynchronous models with small to moderate delay spreads. The baseband signal transmitted by the $k$-th active user to the base station is given by



$$ x_k(t) = A_k \sum_{i=-\infty}^{\infty} b_k(i)\, s_k(t - iT) \tag{1} $$



where $b_k(i) \in \{\pm 1\}$ denotes the $i$-th symbol for user $k$, and the real valued spreading waveform and the amplitude associated with user $k$ are $s_k(t)$ and $A_k$, respectively. The spreading waveforms are expressed by $s_k(t) = \sum_{i=1}^{N} a_k(i)\,\phi(t - iT_c)$, where $a_k(i) \in \{\pm 1/\sqrt{N}\}$, $\phi(t)$ is the chip waveform, $T_c$ is the chip duration and $N = T/T_c$ is the processing gain. Assuming that the receiver is synchronised with the main path, the coherently demodulated composite received signal is

$$ r(t) = \sum_{k=1}^{K} \sum_{l=0}^{L_p-1} h_{k,l}(t)\, x_k(t - \tau_{k,l}) + n(t) \tag{2} $$

where $h_{k,l}(t)$ and $\tau_{k,l}$ are, respectively, the channel coefficient and the delay associated with the $l$-th path and the $k$-th user.



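Assume that $\tau_{k,l} = lT_c$, that the channel is constant during each symbol interval, that the spreading codes are repeated from symbol to symbol and that the receiver is synchronized with the main path. Then the received signal $r(t)$, after filtering by a chip-pulse matched filter and sampling at chip rate, yields the $M$-dimensional received vector

$$ \begin{aligned} \mathbf{r}(i) &= \sum_{k=1}^{K} \Big[ A_k b_k(i)\, \mathbf{C}_k \mathbf{h}_k(i) + A_k b_k(i-1)\, \bar{\mathbf{C}}_k \mathbf{h}_k(i-1) + A_k b_k(i+1)\, \tilde{\mathbf{C}}_k \mathbf{h}_k(i+1) \Big] + \mathbf{n}(i) \\ &= \sum_{k=1}^{K} \Big[ A_k b_k(i)\, \mathbf{p}_k(i) + \boldsymbol{\eta}_k(i) \Big] + \mathbf{n}(i) \end{aligned} \tag{3} $$

The quantities in (3) are defined as follows:

- $M = N + L_p - 1$;
- $\mathbf{n}(i) = [n_1(i) \ldots n_M(i)]^T$ is the complex Gaussian noise vector with $E[\mathbf{n}(i)\mathbf{n}^H(i)] = \sigma^2 \mathbf{I}$;
- $(\cdot)^T$ and $(\cdot)^H$ denote transpose and Hermitian transpose, respectively, and $E[\cdot]$ stands for ensemble average;
- $b_k(i) \in \{\pm 1 + j0\}$ is the symbol for user $k$ and $A_k$ is the amplitude of user $k$;
- $\mathbf{h}_k(i) = [h_{k,0}(i) \ldots h_{k,L_p-1}(i)]^T$ is the user $k$ channel vector, with $h_{k,l}(i) = h_{k,l}(iT_c)$ for $l = 0, \ldots, L_p - 1$;
- $\boldsymbol{\eta}_k(i) = A_k b_k(i-1)\bar{\mathbf{C}}_k \mathbf{h}_k(i-1) + A_k b_k(i+1)\tilde{\mathbf{C}}_k \mathbf{h}_k(i+1)$ is the ISI, which assumes that the channel order is not greater than $N$, i.e. $L_p - 1 \le N$;
- $\mathbf{s}_k = [a_k(1) \ldots a_k(N)]^T$ is the signature sequence for user $k$ and $\mathbf{p}_k(i) = \mathbf{C}_k \mathbf{h}_k(i)$ is the effective signature sequence for user $k$.

The $M \times L_p$ convolution matrix $\mathbf{C}_k$ contains one-chip shifted versions of $\mathbf{s}_k$, and the $M \times L_p$ matrices $\bar{\mathbf{C}}_k$ and $\tilde{\mathbf{C}}_k$ contain segments of $\mathbf{s}_k$. The three matrices have the following structure: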



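$$ \mathbf{C}_k = \begin{bmatrix} a_k(1) & & \mathbf{0} \\ \vdots & \ddots & a_k(1) \\ a_k(N) & & \vdots \\ \mathbf{0} & \ddots & a_k(N) \end{bmatrix}, \quad \bar{\mathbf{C}}_k = \begin{bmatrix} 0 & a_k(N) & \cdots & a_k(N-L_p+2) \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & a_k(N) \\ 0 & \cdots & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & \cdots & \cdots & 0 \end{bmatrix}, \quad \tilde{\mathbf{C}}_k = \begin{bmatrix} 0 & \cdots & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & \cdots & \cdots & 0 \\ a_k(1) & 0 & \cdots & 0 \\ \vdots & \ddots & \ddots & \vdots \\ a_k(L_p-1) & \cdots & a_k(1) & 0 \end{bmatrix} $$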



The MAI comes from the non-orthogonality between the received signature sequences, whereas the ISI span $L_s$ depends on the length of the channel response relative to the length of the chip sequence. For $L_p = 1$, $L_s = 1$ (no ISI); for $1 < L_p \le N$, $L_s = 2$; for $N < L_p \le 2N$, $L_s = 3$.
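The entries of $\mathbf{C}_k$, $\bar{\mathbf{C}}_k$ and $\tilde{\mathbf{C}}_k$ follow from tracking which chip of which symbol reaches each sample through each path. The sketch below builds the three matrices that way and checks them against a direct convolution of the spread chip stream with the channel. It assumes Python with NumPy, a single user with $A_k = 1$, and a channel that is constant over three symbols. `conv_matrices` is an illustrative name, not part of the original work.

```python
import numpy as np

def conv_matrices(a, Lp):
    """Build C_k, C_bar_k and C_tilde_k (each M x Lp, M = N + Lp - 1) from the
    chip sequence a = [a_k(1), ..., a_k(N)], stored 0-based.

    Entry (m, l) multiplies path l; d = m - l is the chip index arriving at
    sample m: 0 <= d < N comes from symbol i, d < 0 from the tail of symbol
    i-1 and d >= N from the head of symbol i+1. Requires Lp - 1 <= N.
    """
    N = len(a)
    M = N + Lp - 1
    C = np.zeros((M, Lp))
    C_bar = np.zeros((M, Lp))     # ISI from b_k(i-1)
    C_tilde = np.zeros((M, Lp))   # ISI from b_k(i+1)
    for m in range(M):
        for l in range(Lp):
            d = m - l
            if 0 <= d < N:
                C[m, l] = a[d]
            elif d < 0:
                C_bar[m, l] = a[N + d]
            else:
                C_tilde[m, l] = a[d - N]
    return C, C_bar, C_tilde
```

The check in the test below also confirms that $\mathbf{p}_k = \mathbf{C}_k\mathbf{h}_k$ is simply the full convolution of $\mathbf{s}_k$ with $\mathbf{h}_k$.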

III. MMSE Decision Feedback Receivers 

Let us describe in this section the design of synchronous MMSE decision feedback detectors. The input to the hard decision device corresponding to the $i$th symbol is $\mathbf{z}(i) = \mathbf{W}^H(i)\mathbf{r}(i) - \mathbf{F}^H(i)\hat{\mathbf{b}}(i)$. Here $\mathbf{z}(i) = [z_1(i) \ldots z_K(i)]^T$ is the input to the decision device and $\mathbf{W}(i) = [\mathbf{w}_1(i) \ldots \mathbf{w}_K(i)]$ is the $M \times K$ feedforward matrix. The vector $\hat{\mathbf{b}}(i) = [\hat{b}_1(i) \ldots \hat{b}_K(i)]^T$ is the $K \times 1$ vector of estimated symbols, which are fed back through the $K \times K$ feedback matrix $\mathbf{F}(i) = [\mathbf{f}_1(i) \ldots \mathbf{f}_K(i)]$. Generally, the DF receiver design is equivalent to determining for user $k$ a feedforward filter $\mathbf{w}_k(i)$ with $M$ elements and a feedback filter $\mathbf{f}_k(i)$ with $K$ elements that provide an estimate of the desired symbol:



$$ z_k(i) = \mathbf{w}_k^H(i)\mathbf{r}(i) - \mathbf{f}_k^H(i)\hat{\mathbf{b}}(i), \quad k = 1, 2, \ldots, K \tag{4} $$



where $\hat{\mathbf{b}}(i) = \mathrm{sgn}[\Re(\mathbf{W}^H\mathbf{r}(i))]$ is the vector with initial decisions provided by the linear section, and $\mathbf{w}_k$ and $\mathbf{f}_k$ are optimized by the MMSE criterion. In particular, the feedback filter $\mathbf{f}_k(i)$ of user $k$ has a number of non-zero coefficients corresponding to the available number of feedback connections for each type of cancellation structure. The final detected symbol is:

$$ \hat{b}_k^f(i) = \mathrm{sgn}(\Re[z_k(i)]) = \mathrm{sgn}\big(\Re[\mathbf{w}_k^H(i)\mathbf{r}(i) - \mathbf{f}_k^H(i)\hat{\mathbf{b}}(i)]\big) \tag{5} $$

where the operator $(\cdot)^H$ denotes Hermitian transpose, $\Re(\cdot)$ selects the real part and $\mathrm{sgn}(\cdot)$ is the signum function.

To describe the optimal MMSE filters we will initially assume perfect feedback, that is $\hat{\mathbf{b}} = \mathbf{b}$, and then consider a more general framework. Consider the following cost function:

$$ J_{MSE} = E\big[|b_k(i) - \mathbf{w}_k^H\mathbf{r}(i) + \mathbf{f}_k^H\mathbf{b}(i)|^2\big] \tag{6} $$



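Let us divide the users into two sets, similarly to [14]:

$$ \mathcal{D} = \{j : b_j \text{ is fed back}\} \tag{7} $$

$$ \mathcal{U} = \{j : j \notin \mathcal{D}\} \tag{8} $$

where the two sets $\mathcal{D}$ and $\mathcal{U}$ correspond to detected and undetected users, respectively. Let us also define the matrices of effective spreading sequences $\mathbf{P} = [\mathbf{p}_1 \ldots \mathbf{p}_K]$, $\mathbf{P}_{\mathcal{D}} = [\mathbf{p}_1 \ldots \mathbf{p}_d]$ and $\mathbf{P}_{\mathcal{U}} = [\mathbf{p}_1 \ldots \mathbf{p}_u]$. The minimization of the cost function in (6) with respect to the filters $\mathbf{w}_k$ and $\mathbf{f}_k$ yields: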

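$$ \mathbf{w}_k = \mathbf{R}_{\mathcal{U}}^{-1}\mathbf{p}_k \tag{9} $$

$$ \mathbf{f}_k = \mathbf{P}_{\mathcal{D}}^H\mathbf{w}_k \tag{10} $$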



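where the associated covariance matrices are $\mathbf{R} = E[\mathbf{r}(i)\mathbf{r}^H(i)] = \mathbf{P}\mathbf{P}^H + \sigma^2\mathbf{I}$ and $\mathbf{R}_{\mathcal{U}} = \mathbf{P}_{\mathcal{U}}\mathbf{P}_{\mathcal{U}}^H + \sigma^2\mathbf{I} = \mathbf{R} - \mathbf{P}_{\mathcal{D}}\mathbf{P}_{\mathcal{D}}^H$. Thus, assuming perfect feedback and that user $k$ is the desired one, the associated MMSE for the DF receiver is given by:

$$ J_{MMSE} = \sigma_b^2 - \mathbf{p}_k^H\mathbf{R}_{\mathcal{U}}^{-1}\mathbf{p}_k \tag{11} $$

where $\sigma_b^2 = E[|b_k(i)|^2]$. The result in (11) means that, in the absence of error propagation, the MAI in set $\mathcal{D}$ is eliminated and user $k$ is only affected by interferers in set $\mathcal{U}$.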
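As a numerical sanity check of (9)–(11), the following sketch computes the perfect-feedback filters for a given split of the users into $\mathcal{D}$ and $\mathcal{U}$. It then compares (11) with the cost (6) evaluated directly from the model $\mathbf{r} = \mathbf{P}\mathbf{b} + \mathbf{n}$. The sketch assumes:

- NumPy;
- i.i.d. unit-power symbols ($\sigma_b^2 = 1$);
- amplitudes absorbed into the columns of $\mathbf{P}$;
- no ISI.

The function names are hypothetical.

```python
import numpy as np

def mmse_df_perfect(P, k, D, sigma2):
    """Perfect-feedback MMSE DF filters for user k, eqs. (9)-(11).

    P: M x K effective signatures; D: indices of fed-back users (k not in D).
    Returns (w_k, f_k, J_mmse); f_k has length K and is zero outside D.
    """
    M, K = P.shape
    U = [j for j in range(K) if j not in D]          # undetected users, includes k
    P_U = P[:, U]
    R_U = P_U @ P_U.conj().T + sigma2 * np.eye(M)   # R_U = P_U P_U^H + sigma^2 I
    w = np.linalg.solve(R_U, P[:, k])               # eq. (9)
    f = np.zeros(K, dtype=complex)
    f[D] = P[:, D].conj().T @ w                     # eq. (10), non-zero only on D
    J = 1.0 - np.real(P[:, k].conj() @ w)           # eq. (11) with sigma_b^2 = 1
    return w, f, J

def mse_direct(P, k, w, f, sigma2):
    """Cost (6) evaluated from r = P b + n with i.i.d. +-1 symbols."""
    K = P.shape[1]
    # coefficient multiplying b_j in the error b_k - w^H r + f^H b
    c = np.eye(K)[k] - w.conj() @ P + f.conj()
    return float(np.sum(np.abs(c) ** 2) + sigma2 * np.real(w.conj() @ w))
```

With $\mathcal{D}$ empty, the same function returns the linear MMSE receiver. This gives a simple way to see that feeding back correct decisions can only lower the MMSE.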

For the successive interference cancellation DF (S-DF) detector, we have for user $k$

$$ \mathcal{D} = \{1, \ldots, k-1\}, \quad \mathcal{U} = \{k, \ldots, K\} \tag{12} $$

where the filter matrix $\mathbf{F}(i)$ is strictly upper triangular. The S-DF structure is optimal in the sense that it achieves the sum capacity of the synchronous CDMA channel with AWGN [10]. In addition, the S-DF scheme is less affected by error propagation, although it generally does not provide uniform performance over the user population. In order to design the S-DF receivers and satisfy the constraints of the S-DF structure, the designer must obtain the vector with initial decisions $\hat{\mathbf{b}}(i) = \mathrm{sgn}[\Re(\mathbf{W}^H(i)\mathbf{r}(i))]$ and then resort to the following cancellation approach. The non-zero part of the filter $\mathbf{f}_k$ corresponds to the number of used feedback connections and to the users to be cancelled. For the S-DF, the number of feedback elements and their associated number of non-zero filter coefficients in $\mathbf{f}_k$ (where $k$ goes from the second detected user to the last one) range from 1 to $K - 1$.
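One possible reading of the S-DF procedure is sketched below. The filters follow (9), (10) and (12). Each new decision overwrites the corresponding entry of $\hat{\mathbf{b}}(i)$, so later users cancel the refined estimates. That update rule, the NumPy implementation and the function names are our assumptions for illustration. The 0-based user index `k` in the code corresponds to user $k+1$ in the text.

```python
import numpy as np

def sdf_filters(P, sigma2):
    """S-DF filters: user k (0-based) cancels users 0..k-1, eqs. (9), (10), (12)."""
    M, K = P.shape
    W = np.zeros((M, K), dtype=complex)
    F = np.zeros((K, K), dtype=complex)
    for k in range(K):
        P_U = P[:, k:]                                  # U = {k, ..., K-1}
        R_U = P_U @ P_U.conj().T + sigma2 * np.eye(M)
        W[:, k] = np.linalg.solve(R_U, P[:, k])         # eq. (9)
        F[:k, k] = P[:, :k].conj().T @ W[:, k]          # eq. (10) with D = {0, ..., k-1}
    return W, F

def sgn(x):
    """Signum of the real part, mapping 0 to +1 so decisions stay in {-1, +1}."""
    return np.where(np.real(x) >= 0, 1.0, -1.0)

def sdf_detect(W, F, r):
    """Initial linear decisions, then successive cancellation in user order."""
    b_hat = sgn(W.conj().T @ r)                         # initial decisions
    for k in range(W.shape[1]):
        z_k = W[:, k].conj() @ r - F[:, k].conj() @ b_hat   # eq. (4)
        b_hat[k] = sgn(z_k)                             # eq. (5); reused by later users
    return b_hat
```

Because column $k$ of $\mathbf{F}$ is non-zero only above the diagonal, the initial decisions of users not yet detected never enter the cancellation in this sketch.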

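The parallel interference cancellation DF (P-DF) [14] receiver can offer uniform performance over the users, but it suffers from error propagation. For the P-DF in a single cell, we have [14]

$$ \mathcal{D} = \{1, \ldots, k-1, k+1, \ldots, K\}, \quad \mathcal{U} = \{k\} \tag{13} $$

$$ \mathbf{w}_k = \big(\mathbf{p}_k\mathbf{p}_k^H + \sigma^2\mathbf{I}\big)^{-1}\mathbf{p}_k \tag{14} $$

The MMSE associated with the P-DF system is obtained by substituting $\mathbf{R}_{\mathcal{U}} = \mathbf{R} - \mathbf{P}_{\mathcal{D}}\mathbf{P}_{\mathcal{D}}^H = \mathbf{p}_k\mathbf{p}_k^H + \sigma^2\mathbf{I}$ into (11), which yields:

$$ J_{MMSE} = \sigma_b^2 - \mathbf{p}_k^H\big(\mathbf{p}_k\mathbf{p}_k^H + \sigma^2\mathbf{I}\big)^{-1}\mathbf{p}_k = \sigma_b^2 - \frac{\mathbf{p}_k^H\mathbf{p}_k}{\sigma^2 + \mathbf{p}_k^H\mathbf{p}_k} \tag{15} $$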



where, for the P-DF, $\mathbf{F}(i)$ is full and constrained to have zeros along the diagonal to avoid cancelling the desired symbols. In order to design P-DF receivers and satisfy their constraints, the designer must obtain the vector with initial decisions $\hat{\mathbf{b}}(i) = \mathrm{sgn}[\Re(\mathbf{W}^H(i)\mathbf{r}(i))]$ and then resort to the following cancellation approach. The non-zero part of the filter $\mathbf{f}_k$ corresponds to the number of used feedback connections and to the users to be cancelled. For the P-DF, the number of feedback connections used and their associated number of non-zero filter coefficients in $\mathbf{f}_k$ are equal to $K - 1$ for all users.
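Under perfect feedback the P-DF filter (14) only has to suppress noise. The matrix inversion lemma therefore reduces it to a scaled matched filter and yields the second form of (15). The NumPy sketch below checks both statements numerically, assuming $\sigma_b^2 = 1$. The function names are illustrative.

```python
import numpy as np

def pdf_filter_and_mmse(p, sigma2):
    """P-DF feedforward filter (14) and MMSE (15) for one user, sigma_b^2 = 1."""
    M = p.shape[0]
    R_U = np.outer(p, p.conj()) + sigma2 * np.eye(M)   # U = {k}
    w = np.linalg.solve(R_U, p)                         # eq. (14)
    J = 1.0 - np.real(p.conj() @ w)                     # first form of (15)
    return w, J

def pdf_mmse_closed_form(p, sigma2):
    """Second form of (15), obtained with the matrix inversion lemma."""
    e = np.real(p.conj() @ p)                           # p^H p
    return 1.0 - e / (sigma2 + e)
```

The closed form makes the dependence on the effective signature energy $\mathbf{p}_k^H\mathbf{p}_k$ explicit: with perfect feedback, the P-DF performs like a single-user matched filter.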

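Now let us consider a more general framework, where the feedback is not perfect. Replace the fed-back symbols $\mathbf{b}(i)$ in the cost function (6) by the decisions $\hat{\mathbf{b}}(i)$. Minimizing that cost with respect to $\mathbf{w}_k$ and $\mathbf{f}_k$ leads to the following filter expressions:

$$ \mathbf{w}_k = \mathbf{R}^{-1}(\mathbf{p}_k + \mathbf{B}\mathbf{f}_k) \tag{16} $$

$$ \mathbf{f}_k = \big(E[\hat{\mathbf{b}}\hat{\mathbf{b}}^H]\big)^{-1}\mathbf{B}^H\mathbf{w}_k \approx \mathbf{B}^H\mathbf{w}_k \tag{17} $$

where $E[\hat{\mathbf{b}}\hat{\mathbf{b}}^H] \approx \mathbf{I}$ for small error rates and $\mathbf{B} = E[\mathbf{r}(i)\hat{\mathbf{b}}^H(i)]$. The associated MMSE for DF receivers subject to $E[\hat{\mathbf{b}}\hat{\mathbf{b}}^H] \approx \mathbf{I}$ and imperfect feedback is approximately given by

$$ J_{MMSE} \approx \sigma_b^2 - \mathbf{p}_k^H\mathbf{R}^{-1}\mathbf{p}_k - \mathbf{p}_k^H\mathbf{R}^{-1}\mathbf{B}\mathbf{f}_k \tag{18} $$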

In Appendix I we show that the expression in (18) equals (11) under perfect feedback, and provide several other relationships between DF structures with and without perfect feedback. Note that the MMSE associated with DF receivers that are subject to imperfect feedback depends on two quantities:

- the matrix $\mathbf{B} = E[\mathbf{r}\hat{\mathbf{b}}^H]$, which equals $\mathbf{P}_{\mathcal{D}}$ under perfect feedback;
- the feedback filter $\mathbf{f}_k$, or the set of filters $\mathbf{F}$.

Specifically, choosing a given structure for $\mathbf{F}$ leads to a different method of interference cancellation and to a different performance improvement of the DF detector over linear detection. The motivation for our work is to investigate alternative methods of finding structures for $\mathbf{F}$ that provide enhanced performance.
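The reduction of (16)–(18) to (9)–(11) under perfect feedback, stated above for Appendix I, can also be checked numerically. The sketch assumes:

- NumPy;
- $E[\hat{\mathbf{b}}\hat{\mathbf{b}}^H] = \mathbf{I}$ and $\sigma_b^2 = 1$;
- $\mathbf{B} = \mathbf{P}_{\mathcal{D}}$.

It solves the coupled equations (16)–(17) in closed form by substituting $\mathbf{f}_k = \mathbf{B}^H\mathbf{w}_k$ into (16). This direct solve is our choice, not a procedure taken from the paper.

```python
import numpy as np

def df_filters_general(R, p_k, B):
    """Jointly solve (16) and (17) with E[b_hat b_hat^H] = I.

    Substituting f = B^H w into w = R^{-1}(p_k + B f) gives
    (R - B B^H) w = p_k, which is solved directly.
    """
    w = np.linalg.solve(R - B @ B.conj().T, p_k)
    f = B.conj().T @ w
    # eq. (18) with sigma_b^2 = 1
    J = (1.0 - np.real(p_k.conj() @ np.linalg.solve(R, p_k))
             - np.real(p_k.conj() @ np.linalg.solve(R, B @ f)))
    return w, f, J
```

When $\mathbf{B} = \mathbf{P}_{\mathcal{D}}$, the matrix $\mathbf{R} - \mathbf{B}\mathbf{B}^H$ is exactly $\mathbf{R}_{\mathcal{U}}$. This is why the test recovers (9), (10) and (11).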



IV. Successive Parallel Arbitrated DF and Iterative Detection

In this section, we present a novel interference cancellation structure and describe a low complexity near-optimal ordering algorithm that employs different orders of cancellation and then selects the most likely symbol estimate. The proposed ordering algorithm is compared with the optimal user ordering algorithm, which requires the evaluation of K! different cancellation orders and turns out to be too complex for practical use. The new receiver structure, denoted successive parallel arbitrated DF (SPA-DF) detection, is then combined with iterative cascaded DF stages [14], [15] to further refine the symbol estimates. The motivation for the novel DF structures is to mitigate the effects of error propagation often found in P-DF structures [14], [15]. Such structures are of great interest for uplink scenarios because they can provide uniform performance over the users.

A. Successive Parallel Arbitrated DF Detection

The idea of parallel arbitration is to employ successive interference cancellation (SIC) to rapidly converge to a local maximum of the likelihood function. By running parallel branches of SIC with different orders of cancellation, one can arrive at sufficiently different local maxima [16]. The goal of the new scheme, whose block diagram is shown in Fig. 1, is to improve performance using parallel searches and to select the most likely symbol estimate. The ordering algorithm runs SIC in different branches, ordered by the power of the users, to rapidly converge to a local maximum of the likelihood function. Our approach then selects the most likely estimate on the basis of the Euclidean distance. In order to obtain the benefits of parallel search, the candidates should be arbitrated, yielding different estimates of a symbol. The estimate of a symbol that has the highest likelihood is then selected at the output. Unlike the work of Barriac and Madhow [16], which employed matched filters as the starting point, we adopt MMSE DF receivers as the initial condition and the Euclidean distance for selecting the most likely symbol. The concept of parallel arbitration is thus incorporated into a DF detector structure that applies linear interference suppression followed by SIC and yields improved starting points as compared to matched filters. Note that, unlike the PASIC in [16], our approach does not require signal reconstruction, because the MMSE filters automatically compute the coefficients for interference cancellation.

Fig. 1. Block diagram of the proposed SPA-DF receiver.



Following the schematic of Fig. 1, the user $k$ output of the parallel branch $l$ ($l = 1, \ldots, L$) for the SPA-DF receiver structure is given by:

$$ z_k^l(i) = \mathbf{w}_k^H(i)\mathbf{r}(i) - [\mathbf{M}_l\mathbf{F}(i)]_k^H\hat{\mathbf{b}}(i) \tag{19} $$



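where $\hat{\mathbf{b}}(i) = \mathrm{sgn}[\Re(\mathbf{W}^H\mathbf{r}(i))]$ and $[\cdot]_k$ denotes the $k$th column of its matrix argument. The matrices $\mathbf{M}_l$ are permuted versions of the $K \times K$ identity matrix $\mathbf{I}_K$; their structures for an $L = 4$-branch SPA-DF scheme are given by:

$$ \mathbf{M}_1 = \mathbf{I}_K, \quad \mathbf{M}_2 = \begin{bmatrix} \mathbf{0}_{K/4,3K/4} & \mathbf{I}_{K/4} \\ \mathbf{I}_{3K/4} & \mathbf{0}_{3K/4,K/4} \end{bmatrix}, \quad \mathbf{M}_3 = \begin{bmatrix} \mathbf{0}_{K/2,K/2} & \mathbf{I}_{K/2} \\ \mathbf{I}_{K/2} & \mathbf{0}_{K/2,K/2} \end{bmatrix}, \quad \mathbf{M}_4 = \begin{bmatrix} 0 & \cdots & 0 & 1 \\ 0 & \cdots & 1 & 0 \\ \vdots & & & \vdots \\ 1 & 0 & \cdots & 0 \end{bmatrix} \tag{20} $$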



where $\mathbf{0}_{m,n}$ denotes an $m \times n$-dimensional matrix full of zeros. The structures of the matrices $\mathbf{M}_l$ correspond to phase shifts in the cancellation order of the users. The purpose of the matrices in (20) is to change the order of cancellation. When $\mathbf{M} = \mathbf{I}$, the order of cancellation is a simple successive cancellation (S-DF) based upon the user powers (the same as [9], [10]). Specifically, the above matrices perform the cancellation in the following orders with respect to user powers:

- $\mathbf{M}_1$: indices $1, \ldots, K$;
- $\mathbf{M}_2$: indices $K/4, K/4+1, \ldots, K, 1, \ldots, K/4-1$;
- $\mathbf{M}_3$: indices $K/2, K/2+1, \ldots, K, 1, \ldots, K/2-1$;
- $\mathbf{M}_4$: indices $K, \ldots, 1$ (reverse order).

The proposed ordering algorithm shifts the ordering of the users according to $K/B$, where $B$ is the number of parallel branches. The rationale for this approach is to shift the ordering and attempt to benefit a given user or group of users in each decoding branch. Following this approach, a user that appears to be in an unfavorable position for a given ordering can benefit in other parallel branches by being detected in a more favorable situation. For more branches, additional phase shifts are applied to the user cancellation ordering. Note that different update orders were tested, although they did not result in performance improvements.
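The four cancellation orders listed above are cyclic shifts of the power-sorted user list plus a reversal. The plain-Python sketch below generates them directly from that list, assuming $K$ is a multiple of 4 and using 1-based user indices as in the text. It also contrasts the $L = 4$ candidates with the $K!$ orders that an exhaustive search would visit. It does not attempt to reproduce the exact block layout of (20).

```python
import math

def spa_orders(K):
    """Cancellation orders of the four SPA-DF branches, as listed in the text.

    Users are indexed 1..K after sorting by power; K is assumed to be a
    multiple of 4.
    """
    base = list(range(1, K + 1))

    def shifted(start):
        # start, start+1, ..., K, 1, ..., start-1
        return base[start - 1:] + base[:start - 1]

    return [base,              # M_1: 1, ..., K
            shifted(K // 4),   # M_2: K/4, ..., K, 1, ..., K/4-1
            shifted(K // 2),   # M_3: K/2, ..., K, 1, ..., K/2-1
            base[::-1]]        # M_4: K, ..., 1 (reverse order)

def candidate_counts(K, L=4):
    """Orders evaluated by SPA-DF (L) versus an exhaustive search (K!)."""
    return L, math.factorial(K)
```

For $K = 8$, `spa_orders(8)[2]` gives `[4, 5, 6, 7, 8, 1, 2, 3]`. `candidate_counts(8)` returns `(4, 40320)`, which illustrates why the exhaustive search in (22) is impractical.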

The final output $\hat{b}_k^f(i)$ of the SPA-DF detector chooses the best estimate of the $L$ candidates for each symbol interval $i$ as described by:

$$ \hat{b}_k^f(i) = \mathrm{sgn}\Big(\Re\Big[\arg\min_{1 \le l \le L} e_k^l(i)\Big]\Big) \tag{21} $$

The best estimate is the value $z_k^l(i)$ that minimizes $e_k^l(i) = |\hat{b}_k^l(i) - z_k^l(i)|$, where $\hat{b}_k^l(i) = \mathrm{sgn}(\Re[z_k^l(i)])$ is the hard decision of branch $l$. The vector $\hat{\mathbf{b}}^f(i) = [\hat{b}_1^f(i) \ldots \hat{b}_K^f(i)]^T$ collects the final decisions. The number of parallel branches $L$ that yield detection candidates is a parameter that must be chosen by the designer. In this context, the optimal ordering algorithm conducts an exhaustive search and is given by

$$ \hat{b}_k^f(i) = \mathrm{sgn}\Big(\Re\Big[\arg\min_{1 \le l \le K!} e_k^l(i)\Big]\Big) \tag{22} $$

where the number of candidates is $L = K!$, which is clearly far too complex for practical systems. Our studies indicate that $L = 4$ achieves most of the gains of the new structure and offers a good trade-off between performance and complexity.
The SPA-DF system employs the same filters, namely $\mathbf{W}$ and $\mathbf{F}$, as the traditional S-DF structure and requires additional arithmetic operations to compute the parallel arbitrated candidates. A discussion of the approximate MMSE attained by the proposed SPA-DF structure is included in Appendix II, whereas expressions for the MMSE of the optimal ordering algorithm are given in Appendix III. As occurs with S-DF receivers, a disadvantage of the SPA-DF detector is that it generally does not provide uniform performance over the user population. In a scenario with tight power control, successive techniques tend to favor the last detected users, resulting in non-uniform performance. To equalize the performance of the users, an iterative technique with multiple stages can be used.
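A compact NumPy sketch of (19) and (21) is given below. It assumes that $\mathbf{W}$, $\mathbf{F}$ and the permutation matrices $\mathbf{M}_l$ are already available (for instance from an S-DF design), and it maps $\mathrm{sgn}(0)$ to $+1$. The function names are hypothetical. The feedforward output $\mathbf{W}^H\mathbf{r}(i)$ is shared by all branches, so each extra branch costs one $K \times K$ feedback product. This is in line with the remark that SPA-DF reuses the S-DF filters.

```python
import numpy as np

def sgn(x):
    """Signum of the real part, with 0 mapped to +1."""
    return np.where(np.real(x) >= 0, 1.0, -1.0)

def spa_df_candidates(W, F, Ms, r):
    """Soft outputs of the L branches, eq. (19); row l holds z^l(i) for all users.

    W: M x K feedforward matrix, F: K x K feedback matrix shared by all
    branches, Ms: list of K x K permutation matrices M_l, r: received vector.
    """
    b_hat = sgn(W.conj().T @ r)           # initial linear decisions
    ff = W.conj().T @ r                   # feedforward part, common to all branches
    return np.array([ff - (Ml @ F).conj().T @ b_hat for Ml in Ms])

def spa_df_arbitrate(Z):
    """Eq. (21): per user, keep the candidate closest to its own hard decision."""
    e = np.abs(sgn(Z) - Z)                # e_k^l = |b_hat_k^l - z_k^l|
    best = np.argmin(e, axis=0)           # winning branch for each user
    z_best = Z[best, np.arange(Z.shape[1])]
    return sgn(z_best), best
```

For the $L \times K$ candidate array `[[0.9, -0.2], [0.4, -1.1]]`, the first user keeps branch 0 and the second user keeps branch 1. Both final decisions, $+1$ and $-1$, come from the candidate lying closest to a constellation point.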

B. Iterative Successive Parallel Arbitrated DF Detection 

In [14], Woodward et al. presented an iterative detector with an S-DF in the first stage and P-DF or S-DF structures, with users being demodulated in reverse order, in the second stage. The work of [14] was then extended to account for coded systems and training-based reduced-rank filters [15]. Here, we focus on the proposed SPA-DF receiver and the low complexity near-optimal ordering algorithm, and combine the SPA-DF structure with iterative detection. An iterative receiver with hard-decision feedback is defined by:

$$ \mathbf{z}^{(m)}(i) = \mathbf{W}^H(i)\mathbf{r}(i) - \mathbf{F}^H(i)\hat{\mathbf{b}}^{(m)}(i) \tag{23} $$



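where the filters $\mathbf{W}$ and $\mathbf{F}$ can be S-DF or P-DF structures, and $\hat{\mathbf{b}}^{(m)}(i)$ is the vector of tentative decisions from the preceding iteration that is described by:

$$ \hat{\mathbf{b}}^{(1)}(i) = \mathrm{sgn}\big(\Re[\mathbf{W}^H(i)\mathbf{r}(i)]\big) \tag{24} $$

$$ \hat{\mathbf{b}}^{(m)}(i) = \mathrm{sgn}\big(\Re[\mathbf{z}^{(m-1)}(i)]\big), \quad m > 1 \tag{25} $$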



where the number of stages m depends on the application. 
More stages can be added and the order of the users is reversed 
from stage to stage. 
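The recursion (23)–(25) can be sketched as follows, assuming NumPy and fixed filters across stages. For the check, we deliberately use a matched-filter front end ($\mathbf{W} = \mathbf{P}$) and a parallel feedback matrix with zero diagonal. With this setup the linear first pass makes an error on a weak user, which the next stage then corrects. The paper itself uses MMSE filters. The function name is hypothetical.

```python
import numpy as np

def sgn(x):
    """Signum of the real part, with 0 mapped to +1."""
    return np.where(np.real(x) >= 0, 1.0, -1.0)

def iterative_df(W, F, r, n_stages):
    """Hard-decision iterative DF, eqs. (23)-(25); returns decisions per stage."""
    b_hat = sgn(W.conj().T @ r)                   # eq. (24): linear first pass
    history = [b_hat]
    for _ in range(n_stages):
        z = W.conj().T @ r - F.conj().T @ b_hat   # eq. (23)
        b_hat = sgn(z)                            # eq. (25)
        history.append(b_hat)
    return history
```

In the test, user 1 has a quarter of the energy of user 2 and a strong cross-correlation with it. The linear pass therefore decides $-1$ for user 1, and one cancellation stage restores the correct $+1$.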



To equalize the performance over the user population, we consider a two-stage structure. The first stage is an SPA-DF scheme with filters $\mathbf{W}^1$ and $\mathbf{F}^1$. The tentative decisions are passed to the second stage, which consists of an S-DF, a P-DF or an SPA-DF detector with filters $\mathbf{W}^2$ and $\mathbf{F}^2$. These are computed similarly to $\mathbf{W}^1$ and $\mathbf{F}^1$ but use the decisions of the first stage. The resulting iterative receiver system is denoted ISPAS-DF when an S-DF scheme is deployed in the second stage, whereas for P-DF filters in the second stage the overall scheme is called ISPAP-DF. The output of the second stage of the resulting scheme is:

$$ z_j^{(2)}(i) = [\mathbf{M}\mathbf{W}^2(i)]_j^H\mathbf{r}(i) - [\mathbf{M}\mathbf{F}^2(i)]_j^H\hat{\mathbf{b}}^{(2)}(i) \tag{26} $$

The symbols in (26) are:

- $z_j$, the $j$th component of the soft output vector $\mathbf{z}$;
- $\mathbf{M}$, a square permutation matrix with ones along the reverse diagonal and zeros elsewhere (similar to $\mathbf{M}_4$ in (20));
- $[\cdot]_j$, the $j$th column of the argument (a matrix);
- $\hat{\mathbf{b}}^{(2)}(i) = \mathrm{sgn}[\Re(\mathbf{z}^{(1)}(i))]$.

The third proposed iterative scheme is denoted ISPASPA-DF and corresponds to an SPA-DF architecture employed in both stages. The output of the $l$th branch of its second stage is:



$$ z_{l,j}^{(2)}(i) = [\mathbf{M}\mathbf{W}^2(i)]_j^H\mathbf{r}(i) - [\mathbf{M}_l\mathbf{F}^2(i)]_j^H\hat{\mathbf{b}}^{(2)}(i) \tag{27} $$

where $\hat{b}_j^{(2)}(i) = \mathrm{sgn}\big(\Re\big[\arg\min_{1 \le l \le L} e_{l,j}(i)\big]\big)$ and $e_{l,j}(i) = |\hat{b}_{l,j}(i) - z_{l,j}^{(2)}(i)|$, with $\hat{b}_{l,j}(i)$ the hard decision of branch $l$ for user $j$. Note that the users in the second stage are demodulated successively and in reverse order relative to the first branch of the SPA-DF structure (a conventional S-DF). Reversing the cancellation order in successive stages equalizes the performance of the users over the population, or at least reduces the performance disparities. Indeed, this provides better performance than keeping the same ordering, since the last decoded users in the first stage tend to be favored by the reduced interference. The rationale is that performance improves when these favored users (the last decoded ones) are the first to be decoded in the second stage. Additional stages can be included, although our studies suggest that the gains in performance are marginal. Hence, the two-stage scheme is adopted for the rest of this work.
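The reverse-order second stage can be sketched as below, assuming NumPy and real-valued signals for brevity. It follows the decision-mixing rule written out for the coded case in (29). Users not yet processed in the second stage contribute their first-stage decisions, while users already processed contribute their new ones. The filter layout in the test (an S-DF design in reversed order) is a hypothetical example rather than the paper's $[\mathbf{M}\mathbf{W}^2]_j$ indexing.

```python
import numpy as np

def sgn(x):
    """Signum of the real part, with 0 mapped to +1."""
    return np.where(np.real(x) >= 0, 1.0, -1.0)

def reverse_order_stage(W2, F2, r, b1):
    """Second stage that demodulates users in reverse order (last user first).

    b1: first-stage decisions. Entries of b2 for users not yet visited still
    hold first-stage decisions; visited users hold their new decisions.
    """
    K = len(b1)
    b2 = np.array(b1, dtype=float)
    visit = list(range(K - 1, -1, -1))      # reversed user order
    for j in visit:
        z_j = W2[:, j].conj() @ r - F2[:, j].conj() @ b2
        b2[j] = sgn(z_j)                    # overwrite with second-stage decision
    return b2, visit
```

In the test, every first-stage decision is wrong, yet the second stage recovers all symbols. With this S-DF-style layout, each user cancels only users already re-detected in the second stage.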



V. Successive Parallel Arbitrated DF and 
Iterative Detection for Coded Systems 

This section is devoted to the description of the proposed 
SPA-DF detector and iterative detection schemes for coded 
systems which employ convolutional codes with Viterbi and 
turbo decoding. Specifically, we present iterative DF detectors 
based on the proposed SPA-DF structure which exploits user 
ordering and combine the SPA-DF with either the S-DF, the 
P-DF or another SPA-DF in the second stage. We show that 
a reduced number of turbo iterations can be used with the 
proposed iterative detector when a near-optimal user ordering 
is employed and that savings in transmitted power are also 
obtained as compared to previously reported turbo detectors 

flu-iiia. 



A. Convolutional Codes with Viterbi Decoding 

The structure shown in Fig. 1 can be extended to coded systems by including two blocks:

- a decoder after the selection unit and before the slicer;
- an encoder that processes the refined estimates before the feedback filter $\mathbf{F}(i)$.

For the proposed SPA-DF receiver structure, users are decoded successively with the aid of the Viterbi algorithm for each parallel arbitrated branch, then re-encoded with a convolutional encoder and used for interference cancellation. The motivation for the proposed encoded structure is that significant gains can be obtained from two sources: iterative techniques with soft cancellation methods and error control coding [17]–[23], and efficient receiver structures and ordering algorithms such as the novel SPA-DF detector. The decoding process of the existing S-DF, P-DF and iterative schemes, namely the ISS-DF and the ISP-DF, is explained in [14]. The decoding of the proposed iterative detection schemes that employ the SPA-DF detector (ISPAS-DF, ISPAP-DF and ISPASPA-DF) resembles the uncoded case. The second stage benefits from the enhanced estimates provided by the first stage, which now employs convolutional codes followed by a Viterbi decoder with branch metrics based on the Hamming distance. Specifically, the output of the second stage of the resulting scheme for coded systems is:

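$$ z_j^{(2)}(i) = [\mathbf{M}\mathbf{W}^2(i)]_j^H\mathbf{r}(i) - [\mathbf{M}\mathbf{F}^2(i)]_j^H\hat{\mathbf{b}}^{(2)}(i) \tag{28} $$

where

$$ [\hat{\mathbf{b}}^{(2)}(i)]_l = \begin{cases} \hat{b}_l^{(1)}(i), & l \ge j \\ \hat{b}_l^{(2)}(i), & l < j \end{cases} \tag{29} $$

and $[\hat{\mathbf{b}}^{(2)}(i)]_l$ is the $l$th entry of the decision vector $\hat{\mathbf{b}}^{(2)}(i)$. Accordingly, the output of the second stage of the ISPASPA-DF (the SPA-DF architecture is employed in both stages) is described by: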

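$$ z_{l,j}^{(2)}(i) = [\mathbf{M}\mathbf{W}^2(i)]_j^H\mathbf{r}(i) - [\mathbf{M}_l\mathbf{F}^2(i)]_j^H\hat{\mathbf{b}}^{(2)}(i) \tag{30} $$

where $\hat{b}_j^{(2)}(i) = \mathrm{sgn}\big(\Re\big[\arg\min_{1 \le l \le L} e_{l,j}(i)\big]\big)$ and

$$ e_{l,j}(i) = \begin{cases} |\hat{b}_j^{(1)}(i) - z_{l,j}^{(2)}(i)|, & l \ge j \\ |\hat{b}_j^{(2)}(i) - z_{l,j}^{(2)}(i)|, & l < j \end{cases} \tag{31} $$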



B. Iterative Turbo Receiver and Decoding 

A CDMA system with convolutional codes at the transmitter and the proposed iterative SPA-DF receiver with turbo decoding is illustrated in Fig. 2. The proposed iterative (turbo) receiver structure consists of two stages: a soft-input soft-output (SISO) SPA-DF detector and a maximum a posteriori (MAP) decoder. These stages are separated by interleavers and deinterleavers. The processing loop is as follows:

1. Soft outputs from the SPA-DF are used to estimate likelihoods, which are interleaved and input to the MAP decoder for the convolutional code.
2. The MAP decoder computes a posteriori probabilities (APPs) for each user's encoded symbols, which are used to generate soft estimates.
3. These soft estimates are used to update the SPA-DF filters, de-interleaved and fed back through the feedback filter.
4. The process is then iterated.



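The proposed SPA-DF detector yields the a posteriori log-likelihood ratio (LLR) of a transmitted symbol (+1 or -1) for every code bit of each user as given by

$$ \Lambda_1[b_k(i)] = \log\frac{P[b_k(i) = +1 \mid \mathbf{r}(i)]}{P[b_k(i) = -1 \mid \mathbf{r}(i)]}, \quad k = 1, \ldots, K. \tag{32} $$

Using Bayes' rule, the above equation can be written as

$$ \Lambda_1[b_k(i)] = \log\frac{P[\mathbf{r}(i) \mid b_k(i) = +1]}{P[\mathbf{r}(i) \mid b_k(i) = -1]} + \log\frac{P[b_k(i) = +1]}{P[b_k(i) = -1]} = \lambda_1[b_k(i)] + \lambda_2^p[b_k(i)] \tag{33} $$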



where $\lambda_2^p[b_k(i)] = \log\frac{P[b_k(i) = +1]}{P[b_k(i) = -1]}$ represents the a priori LLR of the code bit $b_k(i)$. It is computed by the MAP decoder of the $k$th user in the previous iteration, interleaved and then fed back to the SPA-DF detector. The superscript $p$ denotes the quantity obtained in the previous iteration. Assuming equally likely bits, for the first iteration we have $\lambda_2^p[b_k(i)] = 0$ for all users. The first term in (33), i.e. $\lambda_1[b_k(i)] = \log\frac{P[\mathbf{r}(i) \mid b_k(i) = +1]}{P[\mathbf{r}(i) \mid b_k(i) = -1]}$, represents the extrinsic information yielded by the SISO SPA-DF detector. This information is based on:

- the received data $\mathbf{r}(i)$;
- the prior information about the code bits of all other users, $\lambda_2^p[b_l(i)]$, $l \ne k$;
- the prior information about the code bits of the $k$th user other than the $i$th bit.

The extrinsic information $\lambda_1[b_k(i)]$ provided by the SISO SPA-DF detector is then de-interleaved and fed into the MAP decoder of the $k$th user as the a priori information in the next iteration.
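The split of the posterior LLR in (33) into an extrinsic part and a prior part can be checked with scalar numbers, as can the convention that only the extrinsic part is passed on. The plain-Python sketch below takes the likelihoods $P[\mathbf{r}(i) \mid b_k(i) = \pm 1]$ as given illustrative values instead of computing them from the detector. The function names are hypothetical.

```python
import math

def llr(p_plus):
    """log(P[+1] / P[-1]) for a binary variable with P[+1] = p_plus."""
    return math.log(p_plus / (1.0 - p_plus))

def posterior_llr(lik_plus, lik_minus, prior_plus):
    """Lambda_1 of (32) via Bayes' rule; the evidence P[r] cancels in the ratio."""
    return math.log((lik_plus * prior_plus) / (lik_minus * (1.0 - prior_plus)))

def extrinsic(posterior, prior):
    """Eq. (33) rearranged: lambda_1 = Lambda_1 - lambda_2^p."""
    return posterior - prior
```

Subtracting the prior before passing information to the other component keeps the detector and the decoder from feeding each other their own earlier beliefs.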

Based on the prior information $\lambda_1^p[b_k(i)]$ and the trellis structure of the code, the $k$th user's MAP decoder computes the a posteriori LLR of each code bit as described by

$$ \Lambda_2[b_k(i)] = \log\frac{P[b_k(i) = +1 \mid \lambda_1^p[b_k(i)]; \text{decoding}]}{P[b_k(i) = -1 \mid \lambda_1^p[b_k(i)]; \text{decoding}]} = \lambda_2[b_k(i)] + \lambda_1^p[b_k(i)], \quad k = 1, \ldots, K. \tag{34} $$



From the above equality, it is seen that the output of the MAP decoder is the sum of the prior information $\lambda_1^p[b_k(i)]$ and the extrinsic information $\lambda_2[b_k(i)]$ yielded by the MAP decoder. This extrinsic information is the information about the code bit $b_k(i)$ obtained from the prior information about the other code bits $\lambda_1^p[b_k(j)]$, $j \ne i$ [22]. The MAP decoder also computes the a posteriori LLR of every information bit, which is used to make a decision on the decoded bit at the last iteration. After interleaving, the extrinsic information $\lambda_2[b_k(i)]$, $k = 1, \ldots, K$, yielded by the $K$ MAP decoders is fed back to the SPA-DF detector as the prior information about the code bits of all users in the subsequent iteration. At the first iteration, the extrinsic information $\lambda_1[b_k(i)]$ and $\lambda_2[b_k(i)]$ are statistically independent. As the iterations are computed they become more correlated, and the improvement due to each iteration is gradually reduced.

Fig. 2. Block diagram of the proposed system with the SPA-DF detector and turbo decoding.

For the purpose of MAP decoding, we assume that the interference plus noise at the output of the subtractor in Fig. 2 (b), which corresponds to $\mathbf{z}(i)$, is Gaussian. This assumption is reasonable when there are many active users. It has been used in previous works [15], [22], [23] and provides an efficient and accurate way of computing the extrinsic information. Thus, for the $k$th user and $m$th iteration the soft output of the SPA-DF detector is written as



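$$z_k^{(m)}(i) = V_k^{(m)}(i)\, b_k(i) + \xi_k^{(m)}(i) \qquad (35)$$

where $V_k^{(m)}(i)$ is a scalar variable equivalent to the $k$th user's amplitude and $\xi_k^{(m)}(i)$ is a Gaussian random variable with variance $(\sigma_k^{(m)})^2$. Since we have

$$V_k^{(m)}(i) = E\big[b_k(i)\, z_k^{(m)}(i)\big] \qquad (36)$$

and

$$(\sigma_k^{(m)})^2 = E\big[|z_k^{(m)}(i) - V_k^{(m)}(i)\, b_k(i)|^2\big] \qquad (37)$$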



the designer can obtain the estimates $V_k^{(m)}(i)$ and $(\sigma_k^{(m)})^2$ via the corresponding sample averages over the packet transmission. These estimates are used to compute the detector a posteriori probabilities $P[b_k(i) = \pm 1 \mid z_k^{(m)}(i)]$, which are de-interleaved and input to the MAP decoder for the convolutional code. In what follows, we assume that the MAP decoder generates APPs $P[b_k(i) = \pm 1]$, which are used to compute the input to the feedback filter $\mathbf{f}_k(i)$. From (35), the extrinsic information delivered by the soft-output SPA-DF is given by
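$$\lambda_1[b_k(i)] = \log \frac{p[z_k^{(m)}(i) \mid b_k(i) = +1]}{p[z_k^{(m)}(i) \mid b_k(i) = -1]} = -\frac{\big(z_k^{(m)}(i) - V_k^{(m)}(i)\big)^2}{2(\sigma_k^{(m)})^2} + \frac{\big(z_k^{(m)}(i) + V_k^{(m)}(i)\big)^2}{2(\sigma_k^{(m)})^2} = \frac{2V_k^{(m)}(i)\, z_k^{(m)}(i)}{(\sigma_k^{(m)})^2} \qquad (38)$$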
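The estimation of $V_k^{(m)}(i)$ and $(\sigma_k^{(m)})^2$ by sample averages, followed by the LLR in (38), can be sketched as below. The sketch assumes real-valued soft outputs and, so that the sample averages of (36) and (37) are computable, that reference bits are available (for instance over a known portion of the packet); this is an assumption of the illustration, and the parameter values in the usage lines are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

def estimate_v_and_var(z: Sequence[float], b: Sequence[int]) -> Tuple[float, float]:
    """Sample-average counterparts of (36) and (37) over one packet.

    z: soft outputs z_k^(m)(i) of the detector for user k
    b: +/-1 reference bits, assumed known here (an assumption of this sketch)
    """
    n = len(z)
    v = sum(bi * zi for bi, zi in zip(b, z)) / n
    var = sum((zi - v * bi) ** 2 for bi, zi in zip(b, z)) / n
    return v, var

def gaussian_llr(z: float, v: float, var: float) -> float:
    """Extrinsic LLR of (38) under the Gaussian model (35): 2 V z / sigma^2."""
    return 2.0 * v * z / var

rng = random.Random(1)
bits: List[int] = [rng.choice((-1, 1)) for _ in range(2000)]
# Hypothetical soft outputs generated from model (35) with V = 0.8 and sigma = 0.5
z = [0.8 * bi + rng.gauss(0.0, 0.5) for bi in bits]
v_hat, var_hat = estimate_v_and_var(z, bits)
llrs = [gaussian_llr(zi, v_hat, var_hat) for zi in z]
```

The closed form $2Vz/\sigma^2$ follows because the quadratic terms in $z$ cancel when the two Gaussian log-densities are subtracted, so each LLR costs one multiplication once the two averages are known.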



The SPA-DF turbo detector chooses the best estimate of the $L$ candidates for the $m$th turbo decoding iteration as

$$z_k^{(m)}(i) = \arg\min_{1 \le l \le L} e_k^{l}(i) \qquad (39)$$

where the best estimate is the value $z_k^{l}(i)$ that minimizes $e_k^{l}(i) = |b_k(i) - z_k^{l}(i)|$.
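The selection rule in (39) amounts to a minimum search over the $L$ branch outputs. Because the transmitted bit is not available at the receiver, the sketch below assumes unit-amplitude BPSK and uses each branch's hard decision $\mathrm{sgn}(z_k^l(i))$ as the reference in $e_k^l(i)$; this substitution and the function name are assumptions made for illustration.

```python
from typing import List, Tuple

def select_branch(branch_outputs: List[float]) -> Tuple[int, float]:
    """Return (index, value) of the branch output closest to its hard decision.

    Assumes unit-amplitude BPSK; the reference for each branch is taken to be
    its own hard decision, which is an assumption of this sketch.
    """
    def error(z: float) -> float:
        decision = 1.0 if z >= 0 else -1.0
        return abs(decision - z)

    best = min(range(len(branch_outputs)), key=lambda l: error(branch_outputs[l]))
    return best, branch_outputs[best]

# Hypothetical outputs of L = 4 branches for one symbol of user k
print(select_branch([0.41, -0.12, 0.93, 0.67]))  # (2, 0.93)
```

The returned index is zero-based, so index 2 corresponds to the third branch ($l = 3$).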

C. Extensions 

Here, we briefly comment on how the proposed receiver 
structures can be extended to take into account asynchronous 
systems, dynamic scenarios, other types of communications 
systems and multiple access techniques. 

For asynchronous systems with large relative delays amongst the users, the observation window of each user should be expanded in order to consider the increased number of samples resulting from the offsets amongst users. Alternatively, for small relative delays amongst users, the designer can resort to chip oversampling to compensate for the random timing offsets. These remedies imply augmented filter lengths and, consequently, increased computational complexity. To alleviate the increase in filter length and the increased amount of training, the designer can resort to reduced-rank estimation techniques such as the Multistage Wiener Filter, or to a very promising new technique that employs interpolated FIR filters [15].

An extension with low-complexity turbo schemes such as the one in [26] is also possible with the structures presented in this paper. For dynamic channels that are subject to fading, the designer can rely on adaptive signal processing techniques and make the proposed detector structures adaptive in order to track the variations of the channel and the interference. This includes some modifications for CDMA systems with long codes, which require a different approach for estimating the observation covariance matrix $\mathbf{R}$ due to the loss of cyclostationarity.

Finally, we also remark that the proposed detection schemes 
can be deployed for narrow-band systems with multiple trans- 
mitter and receiver antennas, exploiting the capacity improve- 
ments of spatial multiplexing. 

VI. Simulations 

In this section, we evaluate the performance of the iterative arbitrated DF structures introduced in Section IV and compare them with other existing structures. Due to the extreme difficulty of analyzing such schemes theoretically, we adopt a simulation approach and conduct several experiments in order to verify the effectiveness of the proposed techniques. In particular, we have carried out experiments to assess the bit error rate (BER) performance of the DF receivers for different loads, channel profiles, and signal-to-noise ratios ($E_b/N_0$). The DS-CDMA system employs randomly generated spreading sequences of length $N = 16$, $N = 32$ and $N = 64$, has perfect power control and uses statistically independent random channels with $L_p = 3$, whose coefficients $h_{k,l}$ are taken, for each run, from uniform random variables between $-1$ and $1$, and which are normalized so that $\sum_{l=1}^{L_p} |h_{k,l}|^2 = 1$. It should be remarked that the existence of multipath creates an error floor for the multiuser receivers, making the interference suppression of associated users more difficult. Note also that, given the performance of current power control algorithms, ideal power control is not far from a realistic situation. The matrices used in (14) and (15) are estimated by $\hat{\mathbf{R}}(i) = \frac{1}{i}\sum_{l=1}^{i}\mathbf{r}(l)\mathbf{r}^H(l)$ and by the sample average $\frac{1}{i}\sum_{l=1}^{i}\mathbf{r}(l)\mathbf{b}^H(l)$, respectively. For coded systems, we employ a convolutional code with rate $R = 3/4$ and constraint length 6, which can be found in [24]. In particular, for the turbo decoding plots we used S-random interleavers with block size equal to 256. The following experiments are averaged over 200 runs for uncoded systems, 2000 runs for encoded systems with Viterbi decoding and 20000 runs for turbo-decoded schemes, and in each case the receiver structure (linear or decision feedback (DF)) is indicated. Amongst the different DF structures, we consider:

• S-DF: the successive DF detector of M, EO).

• P-DF: the parallel DF detector of IH, [14].

• ISS-DF: the iterative system of Woodward et al. [14] with S-DF in the first and second stages.

• ISP-DF: the iterative system of Woodward et al. [14] with S-DF in the first stage and P-DF in the second stage.

• SPA-DF: the proposed successive parallel arbitrated receiver.

• ISPAS-DF: the proposed iterative detector with the novel SPA-DF in the first stage and the S-DF in the second stage.

• ISPAP-DF: the proposed iterative receiver with the SPA-DF in the first stage and the P-DF in the second stage.

• ISPASPA-DF: the proposed iterative receiver with the SPA-DF in the first and second stages.
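Two ingredients of the simulation setup, the normalized random multipath channel and the sample-average estimates used in (14) and (15), can be sketched with the Python standard library as follows. The function names are hypothetical, real-valued channel taps are assumed as in the description above, and the second estimate is written as $\frac{1}{i}\sum_l \mathbf{r}(l)\mathbf{b}^H(l)$ with one column per user.

```python
import random
from typing import List, Sequence

def random_channel(num_paths: int, rng: random.Random) -> List[float]:
    """Draw L_p taps uniformly in [-1, 1] and normalize them to unit energy."""
    h = [rng.uniform(-1.0, 1.0) for _ in range(num_paths)]
    norm = sum(x * x for x in h) ** 0.5
    return [x / norm for x in h]

def sample_covariance(received: Sequence[Sequence[complex]]) -> List[List[complex]]:
    """(1/i) * sum_l r(l) r^H(l) over the i received vectors."""
    i, m = len(received), len(received[0])
    R = [[0j] * m for _ in range(m)]
    for r in received:
        for a in range(m):
            for c in range(m):
                R[a][c] += r[a] * complex(r[c]).conjugate()
    return [[x / i for x in row] for row in R]

def sample_crosscorrelation(received: Sequence[Sequence[complex]],
                            bits: Sequence[Sequence[complex]]) -> List[List[complex]]:
    """(1/i) * sum_l r(l) b^H(l); bits[l] holds the K user symbols at time l."""
    i, m, k_users = len(received), len(received[0]), len(bits[0])
    P = [[0j] * k_users for _ in range(m)]
    for r, b in zip(received, bits):
        for a in range(m):
            for k in range(k_users):
                P[a][k] += r[a] * complex(b[k]).conjugate()
    return [[x / i for x in row] for row in P]

rng = random.Random(0)
h = random_channel(3, rng)  # L_p = 3 taps with sum |h|^2 = 1
```

Both estimates are plain running averages, so in a packet-based simulation they can be accumulated symbol by symbol and divided by the number of received vectors at the end.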
Fig. 3. BER performance versus number of symbols.

Let us first consider the proposed SPA-DF, evaluate the number of arbitrated branches that should be used in the ordering algorithm and account for the impact of additional branches upon performance. In addition, we compare the proposed low-complexity user ordering algorithm against the optimal ordering approach, briefly described in Section IV.A, which tests $K!$ possible branches and selects the most likely estimate. We designed the novel DF receivers with $L = 2, 4, 8$ parallel branches and compared their BER performance versus the number of symbols with the existing S-DF and P-DF structures, as depicted in Fig. 3. The results show that the proposed low-complexity ordering algorithm achieves a performance close to the optimal ordering, whilst keeping the complexity reasonably low for practical utilization. Furthermore, the new SPA-DF scheme with $L = 2, 4, 8$ outperforms the S-DF and the P-DF detectors. It can be noted from the curves that the performance of the new SPA-DF improves as the number of parallel branches increases. In this regard, we also notice that the performance gains obtained through additional branches decrease as $L$ is increased, resulting in marginal improvements for more than $L = 4$ branches. For this reason, we adopt $L = 4$ for the remaining experiments, because it presents a very attractive trade-off between performance and complexity.
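To put the complexity argument in numbers, the snippet below compares the number of S-DF passes needed by the exhaustive ordering search ($K!$) with the $L$ branches of the SPA-DF. Measuring complexity in S-DF passes is a rough simplification assumed here for illustration, and the function name is hypothetical.

```python
import math

def sdf_passes(num_users: int, num_branches: int) -> tuple:
    """(S-DF passes for exhaustive ordering, S-DF passes for SPA-DF with L branches)."""
    return math.factorial(num_users), num_branches

for k_users in (4, 8, 16):
    exhaustive, spa = sdf_passes(k_users, num_branches=4)
    print(f"K = {k_users:2d}: K! = {exhaustive}, SPA-DF with L = 4: {spa}")
```

Already for $K = 16$ the exhaustive search would need more than $2 \times 10^{13}$ passes, against 4 for the adopted SPA-DF configuration.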



A performance comparison in terms of BER of the proposed DF structures, namely SPA-DF, ISPAP-DF, ISPAS-DF and ISPASPA-DF, with existing iterative and conventional DF and linear detectors is illustrated in Figs. 4 and 5 for uncoded systems and in Fig. 6 for convolutionally coded systems. In particular, we show BER performance curves versus $E_b/N_0$ and number of users ($K$) for the analyzed receivers. The results for a system with $N = 32$, depicted in Fig. 4, indicate that the best performance is achieved with the novel ISPASPA-DF (the SPA-DF is employed in two cascaded stages), followed by the new ISPAP-DF, the existing ISP-DF [14], the ISPAS-DF, the SPA-DF, the P-DF, the ISS-DF, the S-DF and the linear detector. Specifically, the ISPASPA-DF detector can save up to 1.5 dB and support up to 4 more users in comparison with the ISP-DF (which is the best existing scheme) for the same BER performance. The ISPAP-DF scheme can save up to 1 dB and support up to 2 more users in comparison with the ISP-DF for the same BER performance. Moreover, the ISPASPA-DF and ISPAP-DF systems substantially outperform the other existing approaches.
Fig. 4. BER performance versus (a) $E_b/N_0$ and (b) number of users ($K$).

The results for a larger system with $N = 64$, illustrated in Fig. 5, corroborate the curves obtained for the smaller system in Fig. 4. In particular, the same BER performance hierarchy is observed for the detection schemes (except for the ISPAS-DF, which now outperforms the ISP-DF), and we notice some additional performance gains for the proposed schemes over the existing techniques. Specifically, the ISPASPA-DF detector can save up to 1.8 dB and support up to 10 additional users in comparison with the ISP-DF for the same BER performance. The ISPAP-DF scheme can save up to 1.4 dB and support up to 8 more users in comparison with the ISP-DF for the same BER performance. Moreover, the performance advantages of the ISPASPA-DF and ISPAP-DF systems over the other analyzed schemes are even more pronounced for larger systems.

Fig. 5. BER performance versus (a) $E_b/N_0$ and (b) number of users ($K$).

The BER performance of the analyzed detection schemes was then examined for convolutionally encoded systems with Viterbi decoding, $N = 32$ and rate $R = 3/4$, as depicted in Fig. 6. The results corroborate those obtained for uncoded systems in Figs. 4 and 5 and indicate that the proposed ISPASPA-DF and ISPAP-DF detection schemes significantly outperform the remaining receiver structures. In particular, the ISPASPA-DF detector can support up to 8 additional users in comparison with the ISP-DF for the same BER performance, whereas the ISPAP-DF scheme can accommodate up to 6 more users in comparison with the ISP-DF for the same BER performance. It is worth noting that the linear and P-DF detectors experience performance losses for coded systems, relative to the other structures, as verified in [14]; this is a result of the loss in spreading gain, which increases the interference power at the output of the MMSE receiver.
Fig. 6. BER performance of a convolutionally coded system with $R = 3/4$ versus (a) $E_b/N_0$ and (b) number of users ($K$).



The BER performance of the analyzed detection schemes was also investigated for convolutionally encoded systems with turbo decoding. In our studies with turbo receivers, we tested several code rates and found that $R = 1/2$ was unable to attain good performance for highly loaded systems, whereas $R = 3/4$ was powerful enough to obtain good performance even in fully loaded systems. For this reason, we adopted the rate $R = 3/4$ for the remaining experiments with iterative decoders and considered a system with $N = 32$, as depicted in Fig. 7. The results corroborate those obtained for uncoded and encoded systems with Viterbi decoding in Figs. 5 and 6, and indicate that the proposed ISPASPA-DF and ISPAP-DF detection schemes significantly outperform the remaining receiver structures. In particular, the ISPASPA-DF detector can approach the single-user bound with only 4 iterations and offers a significant advantage over the existing detectors. In comparison with existing iterative DF detectors, the ISPASPA-DF can save up to 0.5 dB for the same BER performance, whereas it can accommodate a fully loaded system with only 4 iterations, operating at an $E_b/N_0$ of only 4 dB, with negligible performance degradation as the load is increased.
Fig. 7. BER performance of a turbo decoded system with $R = 3/4$ versus (a) $E_b/N_0$ and (b) number of users ($K$).

Fig. 8. BER performance of a turbo decoded system with $R = 3/4$ versus number of iterations.



Fig. 8 illustrates the average BER performance of the detectors versus the number of iterations of the turbo decoder. The plots show that the proposed ISPASPA-DF and ISPAP-DF detectors achieve the single-user bound with only 4 and 7 iterations, respectively, whereas the remaining detectors require more iterations to achieve this performance. This is an important feature of the proposed detectors, as they can save considerable computational resources by operating with a lower number of turbo iterations.

The last scenario, shown in Fig. 9, considers the individual BER performance of the users for both uncoded and convolutionally encoded systems with Viterbi decoding. From the curves, we observe that a disadvantage of S-DF relative to P-DF is that it does not provide uniform performance over the user population. We also notice that, for the S-DF receivers, user 1 achieves the same performance as its linear receiver counterpart and, as the successive cancellation is performed, users with higher indices benefit from the interference cancellation. The same non-uniform performance is verified for the proposed SPA-DF, the existing ISS-DF and the novel ISPAS-DF and ISPASPA-DF. Conversely, the new ISPAP-DF, the existing P-DF and the existing ISP-DF provide uniform performance over the users, which is an important goal for the uplink of DS-CDMA systems. In particular, the novel ISPAP-DF detector achieves the best uniform performance of the analyzed structures and is superior to the ISP-DF and to the P-DF, which suffers from error propagation. For coded systems, we notice that the performance of the proposed ISPASPA-DF and ISPAS-DF, and of the existing ISS-DF and S-DF, becomes very attractive for the users with indices greater than 5 (where these SIC-based schemes outperform the ISPAP-DF, the ISP-DF and the P-DF). This suggests the deployment of these structures for systems that rely on differentiated services, where the quality of service (QoS) can be made different for different groups of users. In this context, and as an example, users with the first indices and poorer performance should be allocated to voice services, while the users with better performance should be designated to data transmission services that require improved QoS.
Fig. 9. BER performance versus user index for (a) an uncoded system and (b) a convolutionally coded system with rate $R = 3/4$.



VII. Conclusions 

A novel SPA-DF structure and a low-complexity near-optimal ordering algorithm were presented and combined with iterative techniques for use with cascaded DF stages for mitigating the deleterious effects of error propagation. The proposed SPA-DF and iterative receivers for DS-CDMA systems were investigated in an uplink scenario and compared to existing schemes in the literature. The results for both uncoded and convolutionally encoded systems using Viterbi and turbo decoding show that the new detection schemes can offer considerable gains as compared to existing DF and linear receivers, support systems with higher loads and mitigate the phenomenon of error propagation.



Appendix 

In this Appendix, we provide some relationships between the MMSE attained by a decision feedback structure with perfect and imperfect feedback. Let us consider an alternative expression for the cost function in (4) for user $k$:
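$$J_{MSE}(\mathbf{w}_k, \mathbf{f}_k) = \sigma_b^2 - \mathbf{w}_k^H\mathbf{p}_k - \mathbf{p}_k^H\mathbf{w}_k + \mathbf{w}_k^H\mathbf{R}\mathbf{w}_k - \mathbf{w}_k^H\mathbf{B}\mathbf{f}_k - \mathbf{f}_k^H\mathbf{B}^H\mathbf{w}_k + \mathbf{f}_k^H\mathbf{Q}\mathbf{f}_k \qquad (40)$$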



Consider the expression for the feedforward filter $\mathbf{w}_k = \mathbf{R}^{-1}(\mathbf{p}_k + \mathbf{B}\mathbf{f}_k)$ obtained in (16) and the expression for the feedback filter $\mathbf{f}_k = \mathbf{Q}^{-1}\mathbf{B}^H\mathbf{w}_k$, with $\mathbf{Q} = E[\hat{\mathbf{b}}\hat{\mathbf{b}}^H]$, in (17). By substituting the optimal MMSE expression for $\mathbf{w}_k$ in (16) into (17), we obtain an alternative expression for the feedback filter $\mathbf{f}_k$:



$$\mathbf{f}_k = \mathbf{D}^{-1}\mathbf{Q}^{-1}\mathbf{B}^H\mathbf{R}^{-1}\mathbf{p}_k \qquad (41)$$



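where $\mathbf{D} = (\mathbf{I} - \mathbf{Q}^{-1}\mathbf{B}^H\mathbf{R}^{-1}\mathbf{B})$, and the above expression depends only on $\mathbf{Q}$, $\mathbf{B}$, $\mathbf{R}$ and $\mathbf{p}_k$. By inserting the expression $\mathbf{w}_k = \mathbf{R}^{-1}(\mathbf{p}_k + \mathbf{B}\mathbf{f}_k)$ and (41) into (40), we have for user $k$:

$$\begin{aligned}
J_{MMSE} &= \sigma_b^2 - \mathbf{p}_k^H\mathbf{R}^{-1}\mathbf{p}_k - \mathbf{f}_k^H\mathbf{B}^H\mathbf{R}^{-1}\mathbf{p}_k - \mathbf{p}_k^H\mathbf{R}^{-1}\mathbf{B}\mathbf{f}_k - \mathbf{f}_k^H\mathbf{B}^H\mathbf{R}^{-1}\mathbf{B}\mathbf{f}_k + \mathbf{f}_k^H\mathbf{Q}\mathbf{f}_k \\
&= \sigma_b^2 - \mathbf{p}_k^H\mathbf{R}^{-1}\mathbf{p}_k - \mathbf{p}_k^H\mathbf{R}^{-1}\mathbf{B}\mathbf{D}^{-1}\mathbf{Q}^{-1}\mathbf{B}^H\mathbf{R}^{-1}\mathbf{p}_k - \mathbf{p}_k^H\mathbf{R}^{-1}\mathbf{B}\mathbf{D}^{-1}\mathbf{Q}^{-1}\mathbf{B}^H\mathbf{R}^{-1}\mathbf{p}_k \\
&\quad - \mathbf{p}_k^H\mathbf{R}^{-1}\mathbf{B}\mathbf{D}^{-1}\mathbf{Q}^{-1}\mathbf{B}^H\mathbf{R}^{-1}\mathbf{B}\mathbf{D}^{-1}\mathbf{Q}^{-1}\mathbf{B}^H\mathbf{R}^{-1}\mathbf{p}_k + \mathbf{p}_k^H\mathbf{R}^{-1}\mathbf{B}\mathbf{D}^{-1}\mathbf{Q}^{-1}\mathbf{Q}\mathbf{D}^{-1}\mathbf{Q}^{-1}\mathbf{B}^H\mathbf{R}^{-1}\mathbf{p}_k \qquad (42)
\end{aligned}$$

At this point, it is convenient to adopt the judicious approximation $\mathbf{Q} = E[\hat{\mathbf{b}}\hat{\mathbf{b}}^H] \approx \mathbf{I}$, which is justified for moderate to low BER values. By using this approximation, we have $\mathbf{f}_k \approx \mathbf{D}^{-1}\mathbf{B}^H\mathbf{R}^{-1}\mathbf{p}_k$, where $\mathbf{D} \approx (\mathbf{I} - \mathbf{B}^H\mathbf{R}^{-1}\mathbf{B})$, and the MMSE expression for user $k$ is approximated by:

$$\begin{aligned}
J_{MMSE} &\approx \sigma_b^2 - \mathbf{p}_k^H\mathbf{R}^{-1}\mathbf{p}_k - \mathbf{p}_k^H\mathbf{R}^{-1}\mathbf{B}\mathbf{D}^{-1}\mathbf{B}^H\mathbf{R}^{-1}\mathbf{p}_k - \mathbf{p}_k^H\mathbf{R}^{-1}\mathbf{B}\mathbf{D}^{-1}\mathbf{B}^H\mathbf{R}^{-1}\mathbf{p}_k \\
&\quad + \mathbf{p}_k^H\mathbf{R}^{-1}\mathbf{B}\mathbf{D}^{-1}(\mathbf{I} - \mathbf{B}^H\mathbf{R}^{-1}\mathbf{B})\mathbf{D}^{-1}\mathbf{B}^H\mathbf{R}^{-1}\mathbf{p}_k \\
&\approx \sigma_b^2 - \mathbf{p}_k^H\mathbf{R}^{-1}\mathbf{p}_k - \mathbf{p}_k^H\mathbf{R}^{-1}\mathbf{B}\mathbf{D}^{-1}\mathbf{B}^H\mathbf{R}^{-1}\mathbf{p}_k \qquad (43)
\end{aligned}$$

The approximate expression obtained in (43) represents the MMSE attained by a general decision feedback structure that has imperfect feedback. The expression in (43) is a function of $\mathbf{B}$, $\mathbf{R}$ and $\mathbf{p}_k$, and still depends on the decisions. Let us now assume perfect feedback ($\hat{\mathbf{b}} = \mathbf{b}$) and look at the filter expressions. Since $\mathbf{w}_k = \mathbf{R}^{-1}(\mathbf{p}_k + \mathbf{B}\mathbf{f}_k)$ and $\mathbf{f}_k = \mathbf{B}^H\mathbf{w}_k$, substituting these expressions into (43) yields

$$J_{MMSE} \approx \sigma_b^2 - \mathbf{p}_k^H\mathbf{R}^{-1}(\mathbf{p}_k + \mathbf{B}\mathbf{f}_k) = \sigma_b^2 - \mathbf{p}_k^H\mathbf{w}_k \qquad (44)$$

The expression in (43) has been significantly simplified in (44) due to the assumption of perfect feedback, and (44) indicates that the MMSE for user $k$ is a function of $\mathbf{w}_k$. If we consider a decision feedback structure such as successive cancellation (S-DF) and use the expression for the feedforward filter $\mathbf{w}_k = \mathbf{R}_U^{-1}\mathbf{p}_k$, the MMSE for user $k$ is approximately given by:

$$J_{MMSE} \approx \sigma_b^2 - \mathbf{p}_k^H\mathbf{R}_U^{-1}\mathbf{p}_k \qquad (45)$$

where the above result means that the MMSE attained by user $k$ increases with the number of undetected users, whose contribution is expressed by the covariance matrix $\mathbf{R}_U$. If we consider a decision feedback structure such as parallel cancellation (P-DF) and use the expression for the feedforward filter $\mathbf{w}_k = \mathbf{R}_U^{-1}\mathbf{p}_k = (\mathbf{p}_k\mathbf{p}_k^H + \sigma^2\mathbf{I})^{-1}\mathbf{p}_k$, the MMSE for user $k$ is approximately given by:

$$J_{MMSE} \approx \sigma_b^2 - \mathbf{p}_k^H(\mathbf{p}_k\mathbf{p}_k^H + \sigma^2\mathbf{I})^{-1}\mathbf{p}_k \qquad (46)$$

Note that the above result corresponds to the single-user bound, because we assume that all other users (with perfect decisions) have been fed back, as in P-DF.

For imperfect feedback, the P-DF is known to be susceptible to error propagation, while the S-DF is more effective in combating these deleterious effects. The proposed SPA-DF employs several versions of S-DF in parallel and chooses the best estimate amongst these parallel branches, resulting in improved performance over the S-DF, as verified in our studies. Here, we mathematically discuss the MMSE of the SPA-DF under the assumption of perfect feedback. If we consider the SPA-DF with $L$ branches, we have $L$ different groups of undetected users, namely, $U_1, U_2, \ldots, U_L$, and the associated expression for the feedforward filter $\mathbf{w}_{k,l} = \mathbf{R}_{U_l}^{-1}\mathbf{p}_k$, where $l = 1, 2, \ldots, L$. Therefore, the MMSE for user $k$ is approximately given by: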

$$J_{MMSE} \approx \min_{1 \le l \le L}\big(MSE_{U_l}\big) \qquad (47)$$

where $MSE_{U_l} = \sigma_b^2 - \mathbf{p}_k^H\mathbf{R}_{U_l}^{-1}\mathbf{p}_k$. The above expression means that the MMSE attained by user $k$ with the SPA-DF is no larger than that of a standard S-DF (the case $L = 1$, with approximate MMSE given by (45)). The approximate MMSE in (47) also increases with the number of undetected users expressed by the covariance matrix $\mathbf{R}_{U_l}$, but the SPA-DF can benefit from different groups of undetected users by selecting the group that yields the smallest MSE, resulting in better performance. Indeed, the MMSE of the proposed SPA-DF structure in (47) is upper-bounded by the MMSE of the standard S-DF detector given by (45).
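The perfect-feedback expressions (45) to (47) can be checked numerically with a small sketch. It assumes real-valued effective signatures $\mathbf{p}_j$ (hypothetical values), unit-power symbols ($\sigma_b^2 = 1$), and models $\mathbf{R}_U$ as $\sum_{j \in U}\mathbf{p}_j\mathbf{p}_j^T + \sigma^2\mathbf{I}$ over the undetected users $U$ (including user $k$); the helper names are illustrative, not part of the paper.

```python
from typing import List, Sequence

def solve(A: List[List[float]], y: Sequence[float]) -> List[float]:
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [y[r]] for r, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / M[r][r]
    return x

def mse_undetected(p, k, undetected, sigma2):
    """MSE_U = 1 - p_k^T R_U^{-1} p_k with R_U = sum_{j in U} p_j p_j^T + sigma2 I."""
    n = len(p[k])
    R = [[sigma2 if a == c else 0.0 for c in range(n)] for a in range(n)]
    for j in undetected:
        for a in range(n):
            for c in range(n):
                R[a][c] += p[j][a] * p[j][c]
    x = solve(R, p[k])
    return 1.0 - sum(pa * xa for pa, xa in zip(p[k], x))

def sdf_mse(p, order, k, sigma2):
    """S-DF as in (45): user k still sees itself and every user detected after it."""
    return mse_undetected(p, k, order[order.index(k):], sigma2)

def spadf_mse(p, orders, k, sigma2):
    """SPA-DF as in (47): minimum over the L = len(orders) branch orderings."""
    return min(sdf_mse(p, order, k, sigma2) for order in orders)

# Hypothetical effective signatures for K = 3 users and N = 4 chips
p = [[1.0, 1.0, 1.0, 1.0], [1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, 0.5]]
sigma2 = 0.1
orders = [[0, 1, 2], [2, 1, 0]]  # branch 1 is the plain S-DF order
for k in range(3):
    print(k, sdf_mse(p, orders[0], k, sigma2), spadf_mse(p, orders, k, sigma2),
          mse_undetected(p, k, [k], sigma2))  # last value: P-DF bound of (46)
```

With $U = \{k\}$ the matrix inversion lemma gives $MSE = \sigma^2/(\sigma^2 + \|\mathbf{p}_k\|^2)$, which is a convenient closed-form check of (46); the SPA-DF value can never exceed the S-DF value because the plain S-DF order is one of its branches.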

Here, we mathematically discuss the MMSE of S-DF detectors with the optimal ordering algorithm. If we consider an exhaustive search over all the possible orderings for an S-DF, we have $K!$ different groups of undetected users or, equivalently, $K!$ possible orderings. The optimal ordering S-DF can be seen as a generalisation of the proposed SPA-DF structure in which the number of branches is equal to $K!$. Mathematically, for the case of imperfect decisions, we have for the optimal ordering S-DF the following expression

$$J_{MMSE} \approx \min_{1 \le l \le K!}\big(J_{MSE,l}\big) \qquad (48)$$

where

$$J_{MSE,l} = \sigma_b^2 - \mathbf{p}_k^H\mathbf{R}^{-1}\mathbf{p}_k - \mathbf{f}_{k,l}^H\mathbf{B}^H\mathbf{R}^{-1}\mathbf{p}_k - \mathbf{p}_k^H\mathbf{R}^{-1}\mathbf{B}\mathbf{f}_{k,l} \qquad (49)$$

The expression in (49) is similar in form to the first line of (42) but depends on the ordering $l$ and the associated feedback filter $\mathbf{f}_{k,l}$. In the case of perfect feedback, the corresponding expression for the feedforward filter is $\mathbf{w}_{k,l} = \mathbf{R}_{U_l}^{-1}\mathbf{p}_k$, where $l = 1, 2, \ldots, K!$, and we have $K!$ different groups of undetected users, namely, $U_1, U_2, \ldots, U_{K!}$. Therefore, the MMSE for user $k$ is approximately given by

$$J_{MMSE} \approx \min_{1 \le l \le K!}\big(MSE_{U_l}\big) \qquad (50)$$

where $MSE_{U_l} = \sigma_b^2 - \mathbf{p}_k^H\mathbf{R}_{U_l}^{-1}\mathbf{p}_k$. The above expression means that the MMSE attained by user $k$ with the optimal ordering is no larger than that of a standard S-DF (the case $L = 1$, with approximate MMSE given by (45)). The approximate MMSE in (50) also increases with the number of undetected users expressed by the covariance matrix $\mathbf{R}_{U_l}$. The key point is that the designer searches over all possible groups of undetected users and selects the one that yields the smallest MSE, resulting in better performance. The main problem is that, as $K$ increases, the complexity becomes prohibitive and the implementation impractical.
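The exhaustive search behind (50) can be illustrated by enumerating all $K!$ orderings with `itertools.permutations`. Since (50) is stated per user, the sketch below assumes a single figure of merit for a whole ordering, namely the sum over users of the perfect-feedback MSE of (45), purely for illustration; the signatures are hypothetical, and the small solver is repeated so that the block is self-contained.

```python
import itertools
import math

def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [y[r]] for r, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def mse_undetected(p, k, undetected, sigma2):
    """1 - p_k^T R_U^{-1} p_k with R_U = sum_{j in U} p_j p_j^T + sigma2 I."""
    n = len(p[k])
    R = [[sigma2 if a == c else 0.0 for c in range(n)] for a in range(n)]
    for j in undetected:
        for a in range(n):
            for c in range(n):
                R[a][c] += p[j][a] * p[j][c]
    x = solve(R, p[k])
    return 1.0 - sum(pa * xa for pa, xa in zip(p[k], x))

def total_sdf_mse(p, order, sigma2):
    """Sum of per-user S-DF MSEs for one detection order (perfect feedback assumed)."""
    return sum(mse_undetected(p, k, order[pos:], sigma2) for pos, k in enumerate(order))

def exhaustive_ordering(p, sigma2):
    """Test all K! orders and keep the one with the smallest total MSE."""
    orders = list(itertools.permutations(range(len(p))))
    best = min(orders, key=lambda o: total_sdf_mse(p, list(o), sigma2))
    return list(best), len(orders)

p = [[1.0, 1.0, 1.0, 1.0], [1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, 0.5]]
best_order, tested = exhaustive_ordering(p, sigma2=0.1)
print(best_order, tested, math.factorial(len(p)))  # tested equals K!
```

The number of evaluated orderings grows as $K!$, which is exactly why the search is only feasible for toy values of $K$ such as the three users used here.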